A Julia toolkit for species distribution data

Authors

Timothée Poisot (Université de Montréal)
Ariane Bussières-Fournel (Université de Montréal)
Gabriel Dansereau (Université de Montréal)
Michael D. Catchen (Université de Montréal)

Abstract
(1) Species distribution modeling requires handling varied types of data, and benefits from an integrated approach to programming. (2) We introduce SpeciesDistributionToolkit, a Julia package aiming to facilitate the production of species distribution models. It covers various steps of the data collection and analysis process, and extends to the development of interfaces for integrating additional functionalities. (3) By relying on semantic versioning and strong design choices regarding modularity, we expect this package to lead to improved reproducibility and long-term maintainability. (4) We illustrate the functionalities of the package through several case studies, accompanied by reproducible code.
Keywords

species distribution models, biogeography, occurrence data, land use, climatic data, pseudo-absences

Introduction

Species Distribution Models [SDMs; Elith and Leathwick (2009)], in addition to being essential tools to further our knowledge of biodiversity, are key components of effective conservation decisions (Guisan et al. 2013), planning (McShea 2014), and ecological impact assessment (Baker et al. 2021). The training and evaluation of an SDM is a complex process, with key decisions to make on design and reporting (Zurell et al. 2020). The ability to use the correct data representation at each of these steps is central to the correct interpretation of these models (Araújo et al. 2019). This is particularly true since the choice of data source can affect predictions significantly (Arenas-Castro et al. 2022; Booth 2022), which suggests a need for flexible pipelines in which data sources can be conveniently swapped. In recent years, the number of software packages and tools that assist ecologists with the various steps of developing species distribution models has increased.

As Kass et al. (2024) point out, this increase in the diversity of software tools (most of them in the R language) is a good thing. Because SDMs are a general-purpose methodology, a varied software offer increases the chances that specific decisions can be chained together in the way that best supports a specific use case. By making code available to all users, package developers reduce the need for custom implementations of analytical steps, and contribute to the adoption of good practices in the field. However, building, validating, and applying SDMs requires a diversity of data types from different sources, and many existing packages have been designed independently. They may therefore suffer from low interoperability, which can create friction when using multiple tools together. As an illustration, Kellner, Doser, and Belant (2025) highlight that, among publications on abundance or distribution models that share code and data, about 20% are not reproducible because of issues with package dependencies.

To promote interoperability and improve reproducibility, tools that provide an integrated environment are important. In this manuscript, we present SpeciesDistributionToolkit (abbreviated as SDT), a meta-package for the Julia programming language, offering an integrated environment for the retrieval, formatting, and interpretation of data relevant to the modeling of species distributions. SDT was in part designed to work within the BON-in-a-Box project (Gonzalez et al. 2023; Griffith et al. 2024), a GEO BON initiative to facilitate the calculation and reporting of biodiversity indicators supporting the Kunming-Montréal Global Biodiversity Framework. A leading design consideration for SDT was therefore to maximize interoperability between components and functionalities from the ground up. This is achieved through three mechanisms. First, by relying on strict semantic versioning: package releases provide information about the compatibility of existing code. Second, through the use of interfaces: separate software components (including ones external to the package) can interact without prior knowledge of either implementation, and without dependencies between the components of SDT. Finally, through the use of Julia's extension mechanism, which provides optional integration with external packages. These are detailed in Box 1.

In this manuscript, we provide a high-level overview of the functionalities of the packages forming SDT. We then discuss design principles that facilitate long-term maintenance, development, and integration. We finish by presenting four illustrative case studies: extraction of data at known species occurrences, manipulation of multiple geospatial layers, training and explanation of an SDM, and creation of virtual communities to simulate the spatial distribution of ecological uniqueness. This latter case study is intended to give an impression of what using SDT to support the development of novel analyses feels like. All of the case studies are available as supplementary material, in the form of fully reproducible, self-contained Jupyter notebooks.

Application description

SpeciesDistributionToolkit is released as a package for the Julia programming language (Bezanson et al. 2017). It is licensed under the MIT license, which is approved by the Open Source Initiative. It has evolved from a previous collection of packages to handle GBIF and raster data (Dansereau and Poisot 2021), and now provides extended functionalities as well as improved performance. The package is registered in the Julia package repository and can be downloaded and installed anonymously. It is compatible with the current long-term support (LTS) release of Julia. The full source code, complete commit history, plans for future development, and a forum are available at https://github.com/PoisotLab/SpeciesDistributionToolkit.jl. This page additionally links to the documentation, which contains a full reference for the package functions, a series of brief how-to examples, and longer vignettes showcasing more integrative tutorials.

An overview of the SDT package is given in Figure 1. The project is organized as a "monorepo", in which separate but interoperable packages (meaning that they can be installed independently, but are designed to work cohesively) reside. This allows expanding the scope of the package by moving functionalities into new component packages, without requiring interventions from users. As SDT is registered in the Julia package repository, it can be installed by using add SpeciesDistributionToolkit when in package mode at the Julia prompt. When loading the SDT package with using SpeciesDistributionToolkit, all component packages are automatically and transparently loaded. Therefore, users do not need to know where a specific method or function resides to use it. The monorepo structure has an important advantage for users: the code of all component packages can be found in the same location, which makes inspecting the internal implementation of any package easier. In addition, users can open an issue describing a problem or desired feature within the monorepo, without needing to understand which component package is the right target for this issue. This both lowers the barriers to interacting with the software and facilitates the work of contributors, who can look at all the issues to address in a centralized way. Similarly, monorepos lend themselves to integrated documentation, which is the approach we have chosen with the online SDT manual.

Figure 1: Overview of the packages included in SpeciesDistributionToolkit. The packages are color-coded by intended use (acquisition, representation, and analysis of data). The specific content of each package is presented in the main text. Note that because the package relies on interfaces to facilitate code interoperability, there are only three dependency relationships (black arrows). Some packages can interact with data sources, represented on the left side of the figure. When loading SpeciesDistributionToolkit, all public methods from the package are accessible to the user. Packages that are supported through extensions are in dashed boxes.

SDT uses the built-in Julia package manager to keep all dependencies up to date. Furthermore, we use strict semantic versioning: major versions correspond to changes that would break user-developed code; minor versions represent additional functionalities; patch releases cover minor bug fixes or documentation changes. All component packages are versioned independently, and have their own CHANGELOG file documenting each release. This strict reliance on semantic versioning removes the issues of maintaining compatibility when new functionalities are added: all releases in the v1.x.x branch of SDT depend on component packages in their respective v1.x.x branch, and users can benefit from new functionalities without needing to adapt existing code. This behavior is extensively tested, both through unit tests and through integration testing generated as part of the online documentation.
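Julia's package manager interprets compatibility bounds with "caret" semantics by default, which is what makes the versioning rule above enforceable. The following minimal Python sketch illustrates that rule (it is not code from SDT or from Julia's package manager): a release can replace another when it is not older and the left-most non-zero version component is unchanged.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' (an optional leading 'v' is ignored)."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return (major, minor, patch)


def caret_compatible(required: str, candidate: str) -> bool:
    """True if `candidate` satisfies a caret bound starting at `required`.

    1.2.0 accepts [1.2.0, 2.0.0); 0.3.1 accepts [0.3.1, 0.4.0);
    0.0.3 accepts [0.0.3, 0.0.4), i.e. only 0.0.3 itself.
    """
    req, cand = parse_version(required), parse_version(candidate)
    if cand < req:  # tuples compare component by component
        return False
    if req[0] != 0:
        return cand[0] == req[0]
    if req[1] != 0:
        return cand[:2] == req[:2]
    return cand == req


print(caret_compatible("1.2.0", "1.5.3"))  # True: minor release, new features only
print(caret_compatible("1.2.0", "2.0.0"))  # False: a major release may break code
```

Under this rule, code written against v1.2.0 keeps working with every later v1.x.x release, which is the guarantee described above for the v1.x.x branch of SDT.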

Component packages

The SDT package primarily provides integration between the other packages via method overloading (reusing method names for intuitive and concise code), allowing packages to be efficiently joined together (Roesch et al. 2023). Additional functionalities that reside in the top-level package are the generation of pseudo-absences (Barbet-Massin et al. 2012), access to the gadm.org database, handling of polygon data and zonal statistics, and various quality-of-life methods. Because of the modular nature of the code, any of these functions can be transparently moved to their own packages without affecting reproducibility. Note that all packages can still be installed (and would be fully functional) independently.
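Barbet-Massin et al. (2012) compare several pseudo-absence strategies, one family of which excludes cells that are too close to known presences. A minimal Python sketch of that idea follows; the grid-of-cells representation, the Euclidean distance in cell units, and the function name distance_pseudoabsences are hypothetical and do not reflect SDT's Julia API.

```python
import math
import random
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column) on a regular grid


def distance_pseudoabsences(
    grid_shape: Tuple[int, int],
    presences: List[Cell],
    min_distance: float,
    n: int,
    seed: int = 0,
) -> List[Cell]:
    """Sample `n` distinct cells lying at least `min_distance` (in cell units)
    from every presence cell. Hypothetical helper, for illustration only."""
    rows, cols = grid_shape
    # Eligible background: every cell far enough from all presences
    candidates = [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if all(math.dist((r, c), p) >= min_distance for p in presences)
    ]
    if n > len(candidates):
        raise ValueError("not enough eligible cells for the requested sample")
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(candidates, n)
```

The distance threshold controls the trade-off discussed by Barbet-Massin et al. (2012): pseudo-absences far from presences are less likely to fall in suitable habitat, but also less informative about the edge of the niche.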

The SimpleSDMLayers package offers a series of types to represent raster data in arbitrary projections defined by a PROJ string (Evenden et al. 2024). This package provides the main data representation for most spatial functionalities that SDT supports, and handles saving and loading data. It also contains utility functions to deal with raster data, including interpolation to different spatial grids and coordinate reference systems (CRS), rescaling and quantization of data, masking, and most mathematical operations that can be applied to rasters.
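To make rescaling and quantization concrete, here is a minimal Python sketch that assumes a raster flattened into a list in which None marks masked cells. Quantization is shown as replacing each value by its empirical-quantile class; this is one common definition and may differ from the exact behavior of SimpleSDMLayers.

```python
import math
from bisect import bisect_right
from typing import List, Optional

Raster = List[Optional[float]]  # flattened grid; None marks masked cells


def rescale(layer: Raster, low: float = 0.0, high: float = 1.0) -> Raster:
    """Linearly map unmasked values onto [low, high]; masked cells stay None."""
    values = [v for v in layer if v is not None]
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [
        None if v is None else low + (high - low) * ((v - vmin) / span if span else 0.0)
        for v in layer
    ]


def quantize(layer: Raster, levels: int) -> List[Optional[int]]:
    """Replace each unmasked value by its quantile class (1..levels),
    using the empirical distribution of the unmasked values."""
    values = sorted(v for v in layer if v is not None)
    n = len(values)
    out: List[Optional[int]] = []
    for v in layer:
        if v is None:
            out.append(None)
            continue
        rank = bisect_right(values, v) / n  # empirical CDF, in (0, 1]
        out.append(min(levels, max(1, math.ceil(rank * levels))))
    return out
```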

OccurrencesInterface is a light-weight package to provide a common interface for occurrence data. It implements abstract and concrete types to define a single occurrence and a collection thereof, and a series of methods allowing any occurrence data provider (e.g. GBIF) or data representation to become fully interoperable with the rest of SDT. All SDT methods that handle occurrence data do so through the interface provided by the OccurrencesInterface package, allowing future data sources to be integrated without the need for new code.

The GBIF package offers access to the gbif.org streaming API (GBIF: The Global Biodiversity Information Facility 2025), including the ability to retrieve, filter, and restart downloads. Although this package provides a rich data representation for occurrence data when access to the full GBIF data schema is required, all the objects it returns adhere to the OccurrencesInterface interface. The package also offers the functionality to download datasets from GBIF using their DOI.
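Retrieval from the GBIF API is paginated, which is what makes downloads restartable: a request can resume from the last offset reached. The Python sketch below illustrates this pattern. The limit and offset parameters and the results and endOfRecords fields follow the public GBIF occurrence search API (page size capped at 300 at the time of writing); the function names are hypothetical and do not describe the internals of the GBIF Julia package. The fetch function is injectable so that the paging logic can be exercised offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

API = "https://api.gbif.org/v1/occurrence/search"
PAGE_SIZE = 300  # maximum page size accepted by the search endpoint

Fetch = Callable[[Dict[str, object]], Dict]


def http_fetch(params: Dict[str, object]) -> Dict:
    """Fetch one page of results from the GBIF occurrence search API."""
    url = f"{API}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def stream_occurrences(
    query: Dict[str, object], fetch: Fetch = http_fetch, offset: int = 0
) -> Iterator[List[Dict]]:
    """Yield pages of occurrence records. To restart an interrupted download,
    call again with `offset` set to the number of records already retrieved."""
    while True:
        page = fetch({**query, "limit": PAGE_SIZE, "offset": offset})
        records = page.get("results", [])
        yield records
        offset += len(records)
        if page.get("endOfRecords", True) or not records:
            break
```

Filtering is expressed through the query dictionary (for example a scientificName or country parameter), so the same loop serves any search.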

SimpleSDMDatasets implements an interface to retrieve and locally store raster data, which can be extended by users to support additional data sources. It offers access to a series of common data sources for spatial biodiversity modeling, including the biodiversity mapping project (Jenkins, Pimm, and Joppa 2013), the EarthEnv collection for land cover (Tuanmu and Jetz 2014) and habitat heterogeneity (Tuanmu and Jetz 2015), Copernicus land cover 100m data (Buchhorn et al. 2020), PaleoClim (Brown et al. 2018) data, WorldClim 1 and 2 (Fick and Hijmans 2017) and CHELSA 1 and 2 (Karger et al. 2017), and their projections under various RCPs and SSPs.

Phylopic offers a wrapper around the phylopic.org API to download silhouettes for taxonomic entities. It also provides utilities for citation of the downloaded images. Its functionalities are similar to the rphylopic package (Gearty and Jones 2023).

Fauxcurrences is inspired by the work of Osborne et al. (2022), and allows generating a series of simulated occurrence data that have the same statistical structure as observed ones. The package supports multi-species data, with user-specified weights for conserving intra- and inter-specific occurrence distances.

Finally, SDeMo provides a high-level interface to the training, validation, and interpretation of species distribution models. By providing a series of data transformations (PCA, whitening, z-score) and classifiers (currently BIOCLIM, Naive Bayes, logistic regression, and decision trees), it offers the basic elements to demonstrate the training and evaluation of SDMs, as well as techniques related to heterogeneous ensembles and bagging, with support for arbitrary consensus (Marmion et al. 2009) and voting (Drake 2014) functions. SDeMo promotes the use of interpretable techniques: the package supports regular (Elith et al. 2005) and inflated (Zurell, Elith, and Schröder 2012) partial responses, as well as the calculation and mapping of Shapley values (Wadoux, Saby, and Martin 2023; Mesgaran, Cousens, and Webber 2014) using the standard Monte-Carlo approach (Mitchell et al. 2021). Counterfactuals (Van Looveren and Klaise 2019; Karimi et al. 2019), representing perturbations of the input data that lead to the opposite prediction (i.e. “what environmental conditions would lead to the species being absent”), can also be generated. The API of SDeMo has been designed to (i) enforce the use of best practices, and (ii) be consistent across analyses, so that the package can be used for educational material. Because SDeMo is a generic interface to any predictive model, users can extend it with additional packages. This can be done either through a contribution to the SDT repository, or as part of the code written by users for a specific analysis.
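The Monte-Carlo approach to Shapley values can be summarized in a few lines: repeatedly draw a reference observation and a random feature ordering, and record how much the prediction changes when the feature of interest switches from the reference value to the observed value. The Python sketch below illustrates this permutation-sampling estimator; it is independent of SDeMo's Julia implementation, and the function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_mc(
    model: Model,
    x: Sequence[float],
    background: List[Sequence[float]],
    feature: int,
    samples: int = 200,
    seed: int = 0,
) -> float:
    """Permutation-sampling estimate of the Shapley value of `feature` at `x`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        z = rng.choice(background)      # random reference observation
        order = list(range(len(x)))
        rng.shuffle(order)              # random order in which features "join"
        preceding = order[: order.index(feature)]
        without_f = list(z)
        for k in preceding:             # features joining before `feature` take values from x
            without_f[k] = x[k]
        with_f = list(without_f)
        with_f[feature] = x[feature]    # now add the feature of interest
        total += model(with_f) - model(without_f)
    return total / samples
```

For a linear model and a single reference observation the estimate is exact (each sample returns the coefficient times the difference in the feature); with interactions and a larger background, it converges to the Shapley value as the number of samples grows.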


Case studies

In this section, we provide a series of case studies to illustrate the use of the package. The online manual offers longer tutorials, as well as a series of how-to vignettes to illustrate the full scope of what the package allows. As the notebooks accompanying this article cover the full code required to run these case studies, we do not present the SDT code in the main text (it is presented with detailed explanations in the Supp. Mat.), but rather focus on explaining how the component packages work together in each example.

Landcover consensus map

In this case study (Supp. Mat. 1), we retrieve the land cover data from Tuanmu and Jetz (2014), clip them to a GeoJSON polygon describing the country of Paraguay (SDT can download data directly from gadm.org), and apply the mosaic operation to identify which class is the most locally abundant. This case study uses the SimpleSDMDatasets package to download (and locally cache) the raster data, as well as the SimpleSDMLayers package to provide basic utility functions on raster data. The results are presented in Figure 2.

Figure 2: Land cover consensus (defined as the class with the strongest local representation) in the country of Paraguay. Only the classes that were most abundant in at least one pixel are represented. The code to produce this figure is available as Supp. Mat. 1.
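The consensus operation itself is a per-pixel argmax across class layers. A minimal Python sketch, assuming each land cover class is a flattened list of cover values with None for cells outside the polygon (the representation and the function name are hypothetical, not the SimpleSDMLayers API):

```python
from typing import Dict, List, Optional


def consensus(layers: Dict[str, List[Optional[float]]]) -> List[Optional[str]]:
    """For each pixel, return the name of the class with the largest cover.

    Pixels where every layer is masked (None) stay None; ties go to the
    class listed first, because max() keeps the first maximal element.
    """
    names = list(layers)
    n_pixels = len(layers[names[0]])
    result: List[Optional[str]] = []
    for i in range(n_pixels):
        values = {name: layers[name][i] for name in names if layers[name][i] is not None}
        result.append(max(values, key=values.get) if values else None)
    return result
```

The set of classes that win at least one pixel, set(consensus(layers)) - {None}, corresponds to the classes shown in the legend of a map like Figure 2.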

SimpleSDMDatasets stores raster data locally to avoid re-downloading data upon repeated use. The location of the data is (i) standardized by the package itself, making the files findable by humans, and (ii) changeable by the user to, e.g., store the data within the project folder rather than in a central location. Whenever possible, SDT only reads the part of the raster data that is required for the user's region of interest. This is done by providing additional context in the form of a bounding box (in WGS84, regardless of the underlying raster data projection, in line with the GeoJSON specification). SDT has methods to calculate the bounding box for all the objects it supports.
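Reading only part of a raster amounts to turning a WGS84 bounding box into a window of rows and columns. The Python sketch below shows the arithmetic for a north-up longitude/latitude grid; the BoundingBox type and both function names are hypothetical and not SDT's API, and a projected raster would additionally require reprojecting the box corners.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    left: float    # western longitude, degrees (WGS84)
    right: float   # eastern longitude
    bottom: float  # southern latitude
    top: float     # northern latitude


def bounding_box(points: Iterable[Tuple[float, float]], padding: float = 0.0) -> BoundingBox:
    """Box around (longitude, latitude) pairs, padded and clamped to valid WGS84 ranges."""
    lons, lats = zip(*points)
    return BoundingBox(
        left=max(-180.0, min(lons) - padding),
        right=min(180.0, max(lons) + padding),
        bottom=max(-90.0, min(lats) - padding),
        top=min(90.0, max(lats) + padding),
    )


def pixel_window(
    box: BoundingBox, origin_lon: float, origin_lat: float, resolution: float
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Half-open (row, column) ranges of a north-up grid that cover `box`.

    The grid's top-left corner is (origin_lon, origin_lat); cells are square
    with side `resolution` degrees. Rows increase southwards.
    """
    col_start = math.floor((box.left - origin_lon) / resolution)
    col_stop = math.ceil((box.right - origin_lon) / resolution)
    row_start = math.floor((origin_lat - box.top) / resolution)
    row_stop = math.ceil((origin_lat - box.bottom) / resolution)
    return (row_start, row_stop), (col_start, col_stop)
```

Rounding the start down and the stop up guarantees that cells only partially inside the box are still read.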

Using data from GBIF

SDT provides strong integration between data on species occurrences and sources of geospatial information. To illustrate this, we will collect data on the distribution of Akodon montensis (Rodentia, family Cricetidae), a known host of orthohantaviruses (Burgos et al. 2021; Owen et al. 2010), in Paraguay. In Supp. Mat. 2 we (i) request occurrence data using the GBIF package, (ii) download the silhouette of the species through Phylopic, and (iii) extract temperature and precipitation data at the points of occurrence based on bioclimatic data layers. The results are presented in Figure 3. The full notebook includes information about basic operations on raster data, as well as extraction of data based on occurrence records.

Figure 3: Relationship between temperature and precipitation (BIO1 and BIO12) at each georeferenced occurrence known to GBIF for Akodon montensis. The code to produce this figure is available as Supp. Mat. 2.

In practice, although the data are retrieved using the GBIF package, they are used internally by SDT through the OccurrencesInterface package. This package defines a small convention to handle georeferenced occurrence data, which allows additional occurrence sources to be integrated transparently. By defining a handful of methods for a custom data type, or by using the converters built into the package, users can plug in any occurrence data source or CSV file, and enjoy full compatibility with all SDT functionalities.
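The interface pattern described above can be sketched in Python with a structural Protocol: downstream code depends only on a few accessor methods, so any type that provides them (here a record read from a CSV file) works without changes. The method names taxon, coordinates, and is_present, and the CSV column names, are hypothetical stand-ins and not the actual OccurrencesInterface API.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


class Occurrence(Protocol):
    """Hypothetical minimal occurrence interface (method names are illustrative)."""

    def taxon(self) -> str: ...
    def coordinates(self) -> Tuple[float, float]: ...  # (longitude, latitude)
    def is_present(self) -> bool: ...


@dataclass
class CsvOccurrence:
    """A record read from a CSV file; it satisfies Occurrence structurally."""

    name: str
    lon: float
    lat: float
    present: bool = True

    def taxon(self) -> str:
        return self.name

    def coordinates(self) -> Tuple[float, float]:
        return (self.lon, self.lat)

    def is_present(self) -> bool:
        return self.present


def read_csv(text: str) -> List[CsvOccurrence]:
    """Convert rows with species, longitude, and latitude columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [CsvOccurrence(r["species"], float(r["longitude"]), float(r["latitude"])) for r in rows]


def presence_coordinates(records: Iterable[Occurrence]) -> List[Tuple[float, float]]:
    """Generic downstream code: works with any type implementing the interface."""
    return [r.coordinates() for r in records if r.is_present()]
```

Supporting a new data source then means writing one small type with these methods, rather than modifying every function that consumes occurrences.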

Training a species distribution model

In this case study, we illustrate the integration of SDeMo and SimpleSDMLayers to train a species distribution model. Specifically, we re-use the data from Figure 3, with additional layers of bioclimatic variables. We train a rotation forest (Bagnall et al. 2018), a homogeneous ensemble of PCA followed by decision trees, in which each model has a subset of features and training data. The results are presented in Figure 4. The model is built by selecting an optimal suite of BioClim variables, then predicted in space, and the resulting predicted species range is finally clipped by the elevational range observed in the occurrence data. The data transformations in SDeMo are always applied in a way that prevents data leakage (Stock, Gregr, and Chan 2023).
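Preventing data leakage means that a transformation learns its parameters from the training data only, and is then applied unchanged to the validation data. A minimal Python sketch with a z-score transformation illustrates the principle; the class and function names are hypothetical and do not correspond to SDeMo's Julia types.

```python
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


class ZScore:
    """Standardize each column with statistics learned from training rows only."""

    def fit(self, X: Matrix) -> "ZScore":
        columns = list(zip(*X))
        self.means = [statistics.fmean(c) for c in columns]
        # Constant columns would give a zero deviation; fall back to 1.0
        self.stds = [statistics.pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, X: Matrix) -> Matrix:
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in X]


def split_then_transform(
    X: Matrix, train_idx: Sequence[int], test_idx: Sequence[int]
) -> Tuple[Matrix, Matrix]:
    """Fit on the training rows, then apply to both sets. Fitting on all rows
    before splitting would let the test rows influence the transformation."""
    train = [X[i] for i in train_idx]
    test = [X[i] for i in test_idx]
    scaler = ZScore().fit(train)
    return scaler.transform(train), scaler.transform(test)
```

The same rule applies to PCA inside each tree of a rotation forest: the projection is estimated on the data used to train that tree.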


Figure 4: Predicted range of Akodon montensis in Paraguay based on a rotation forest trained on GBIF occurrences and the BioClim variables. The predicted range is clipped to the elevational range of the species. The code to produce this figure is available as Supp. Mat. 3.

The full notebook (Supp. Mat. 3) has additional information on routines for variable selection, stratified cross-validation, as well as the construction of the ensemble from a single PCA and decision tree. In addition, we report in Figure 5 the partial and inflated partial responses to the most important variable (highlighting an interpretable effect of the variable in the model), as well as the (Monte-Carlo) Shapley values (Wadoux, Saby, and Martin 2023; Mitchell et al. 2021) for each prediction in the training set. Because SDeMo works through generic functions, these methods can be applied to any model specified by the user. In practice, general-purpose ML frameworks in Julia, notably MLJ (Blaom et al. 2020), can also be used and interfaced with SDT by using the classifier and transformer interface.

Figure 5: Partial responses (red) and inflated partial responses (grey) to the most important variable. In addition, the Shapley values for all training data are presented in the same figure; green points are presences, and pale points are pseudo-absences. Shapley values were added to the average model prediction to be comparable to partial responses. The code to produce this figure is available as Supp. Mat. 3.
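The two kinds of response curves in Figure 5 differ only in what happens to the variables that are not being varied. The Python sketch below holds them at their mean for the partial response, and draws them uniformly within their observed range for the inflated version. This uniform draw is a simplification assumed here; Zurell, Elith, and Schröder (2012) describe combinations of summary values, and SDeMo's implementation may differ. Function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def _sweep(column: Sequence[float], steps: int) -> List[float]:
    """Evenly spaced values across the observed range of one variable."""
    lo, hi = min(column), max(column)
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def partial_response(
    model: Model, X: List[List[float]], feature: int, steps: int = 20
) -> Tuple[List[float], List[float]]:
    """Vary `feature` across its range while other variables stay at their mean."""
    columns = list(zip(*X))
    means = [statistics.fmean(c) for c in columns]
    grid = _sweep(columns[feature], steps)
    responses = []
    for value in grid:
        point = list(means)
        point[feature] = value
        responses.append(model(point))
    return grid, responses


def inflated_partial_response(
    model: Model, X: List[List[float]], feature: int,
    steps: int = 20, draws: int = 50, seed: int = 0,
) -> List[Tuple[float, float]]:
    """Same sweep, but other variables are drawn uniformly within their observed
    range; the spread of the resulting cloud shows how much they matter."""
    rng = random.Random(seed)
    columns = list(zip(*X))
    cloud = []
    for value in _sweep(columns[feature], steps):
        for _ in range(draws):
            point = [rng.uniform(min(c), max(c)) for c in columns]
            point[feature] = value
            cloud.append((value, model(point)))
    return cloud
```

A narrow inflated cloud around the partial response suggests that the effect of the variable does not depend strongly on the values of the other predictors.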

Species and location contribution to beta diversity

In the final case study (Supp. Mat. 4), we simulate the distribution of virtual species (Hirzel, Helfer, and Metral 2001) with a logistic response to two environmental covariates (Leroy et al. 2016). We then use this simulated sample to perform the decomposition of \(\beta\)-diversity introduced by Legendre and De Cáceres (2013) and applied by Dansereau, Legendre, and Poisot (2022) to spatially continuous data. This simulates the potential distribution of hotspots and coldspots of ecological uniqueness. The results are presented in Figure 6.

Figure 6: Virtual distribution of normalized (mean of 0 and unit variance) locality contribution to beta-diversity (Legendre and De Cáceres 2013), based on a pool of 100 virtual species. The inset histogram represents the standardized species contribution to beta-diversity. Red areas represent comparatively more unique areas in terms of simulated species composition. The code to produce this figure is available as Supp. Mat. 4.
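The decomposition of Legendre and De Cáceres (2013) partitions the total sum of squares of a sites-by-species matrix: each squared deviation from the species mean is attributed both to a site (LCBD) and to a species (SCBD), and each set of contributions sums to one. A minimal Python sketch follows, with the Hellinger transformation that is commonly applied beforehand; function names are hypothetical, and the normalization used for Figure 6 is not included.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are sites, columns are species


def hellinger(Y: Matrix) -> Matrix:
    """Square root of relative abundances within each site (empty sites stay 0)."""
    out = []
    for row in Y:
        total = sum(row)
        out.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out


def beta_contributions(Y: Matrix) -> Tuple[List[float], List[float]]:
    """Local (per site) and species (per column) contributions to beta diversity."""
    n_sites, n_species = len(Y), len(Y[0])
    col_means = [sum(Y[i][j] for i in range(n_sites)) / n_sites for j in range(n_species)]
    # Squared deviations s_ij = (y_ij - mean_j)^2
    S = [[(Y[i][j] - col_means[j]) ** 2 for j in range(n_species)] for i in range(n_sites)]
    ss_total = sum(map(sum, S))
    lcbd = [sum(row) / ss_total for row in S]
    scbd = [sum(S[i][j] for i in range(n_sites)) / ss_total for j in range(n_species)]
    return lcbd, scbd
```

Sites whose composition departs most from the average composition receive the largest LCBD values, which is what the red areas of a map like Figure 6 represent.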

Because the layers used by SDT are broadcastable, we can rapidly apply a function (here, the logistic response to each environmental covariate) to each layer, and then multiply the suitabilities together. The last step is facilitated by the fact that most basic arithmetic operations are defined for layers, which can for example be added, multiplied, subtracted, and divided by one another.
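The virtual species construction is cell-wise arithmetic: a logistic response per covariate, then a product across covariates. The Python sketch below writes out the loop that broadcasting performs implicitly in Julia, using plain lists as stand-ins for layers; the function names and the (midpoint, slope) parameterization are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def logistic(x: float, midpoint: float, slope: float) -> float:
    """Logistic response: 0.5 at `midpoint`; `slope` sets steepness and direction.
    Written in two branches to avoid overflow in math.exp for extreme values."""
    z = slope * (x - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def virtual_suitability(
    layers: Sequence[Sequence[float]], params: Sequence[Tuple[float, float]]
) -> List[float]:
    """Cell-wise product of logistic responses, one (midpoint, slope) pair per layer."""
    suitability = [1.0] * len(layers[0])
    for layer, (midpoint, slope) in zip(layers, params):
        for i, value in enumerate(layer):
            suitability[i] *= logistic(value, midpoint, slope)
    return suitability
```

Multiplying the responses means that a cell is suitable only if it is suitable along every covariate, which is one common way of combining responses in virtual species (Leroy et al. 2016).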

Conclusion

We have presented SpeciesDistributionToolkit, a package for the Julia programming language aiming to facilitate the collection, curation, analysis, and visualisation of data commonly used in species distribution modeling. Through the use of interfaces and a modular design, we have made this package robust to changes, easy to extend with new functionalities, and well integrated with the rest of the Julia ecosystem. All code for the case studies can be found in Supp. Mat. 1-4.

Plans for active development of the package are focused on (i) additional techniques for pseudo-absence generation, likely leading to their own component package, (ii) full compatibility with the MultivariateStats package for transformations, and (iii) additional SDeMo functionalities to allow cross-validation techniques with biologically relevant structure (Roberts et al. 2017).

Acknowledgements: TP is funded by an NSERC Discovery grant, a Discovery Acceleration Supplement grant, and a Wellcome Trust grant (223764/Z/21/Z). MDC is funded by an IVADO Postdoctoral Fellowship.

Box 1: Integration with other Julia packages

The SDT package benefits from close integration with other packages in the Julia ecosystem. Notably, this includes Makie [including GeoMakie; Danisch and Krumbiegel (2021)] for plotting and interactive data visualisation: all relevant plot types are overloaded for layer and occurrence data. Most data handled by SDT can be exported using the Tables interface, which allows data to be consumed by other packages like DataFrames (Bouchet-Valat and Kamiński 2023) and MLJ (Blaom et al. 2020), or directly saved as CSV files. Interfaces to internal Julia methods are implemented whenever they are pertinent. SimpleSDMLayers and OccurrencesInterface objects behave like arrays, are iterable, and are broadcastable. The SDeMo package relies in part on the StatsAPI interface, which makes it easy to define new data transformation and classifier types to support additional features. Achieving integration with other packages through method overloading and adherence to well-established interfaces is important, as it increases the chances that additional functionalities external to SDT can be used directly or fully supported with minimal additional code. For situations where interfaces are not sufficient to link with other packages, we rely on Julia's extension mechanism. For instance, SimpleSDMLayers objects can be used with Clustering, MultivariateStats, as well as SpatialBoundaries (Strydom and Poisot 2023), with strict version bounds, ensuring that this integration will remain usable regardless of possible changes in external packages.

References

Araújo, Miguel B, Robert P Anderson, A Márcia Barbosa, Colin M Beale, Carsten F Dormann, Regan Early, Raquel A Garcia, et al. 2019. “Standards for Distribution Models in Biodiversity Assessments.” Science Advances 5 (January): eaat4858. https://doi.org/10.1126/sciadv.aat4858.
Arenas-Castro, Salvador, Adrián Regos, Ivone Martins, João Honrado, and Joaquim Alonso. 2022. “Effects of Input Data Sources on Species Distribution Model Predictions Across Species with Different Distributional Ranges.” Journal of Biogeography 49 (July): 1299–1312. https://doi.org/10.1111/jbi.14382.
Bagnall, A, M Flynn, J Large, J Line, A Bostrom, and G Cawley. 2018. “Is Rotation Forest the Best Classifier for Problems with Continuous Features?” arXiv [Cs.LG], September.
Baker, David J, Ilya M D Maclean, Martin Goodall, and Kevin J Gaston. 2021. “Species Distribution Modelling Is Needed to Support Ecological Impact Assessments.” The Journal of Applied Ecology 58 (January): 21–26. https://doi.org/10.1111/1365-2664.13782.
Barbet-Massin, Morgane, Frédéric Jiguet, Cécile Hélène Albert, and Wilfried Thuiller. 2012. “Selecting Pseudo‐absences for Species Distribution Models: How, Where and How Many?: How to Use Pseudo-Absences in Niche Modelling?” Methods in Ecology and Evolution 3 (April): 327–38. https://doi.org/10.1111/j.2041-210x.2011.00172.x.
Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review. Society for Industrial and Applied Mathematics 59 (January): 65–98. https://doi.org/10.1137/141000671.
Blaom, Anthony, Franz Kiraly, Thibaut Lienart, Yiannis Simillides, Diego Arenas, and Sebastian Vollmer. 2020. “MLJ: A Julia Package for Composable Machine Learning.” Journal of Open Source Software 5 (November): 2704. https://doi.org/10.21105/joss.02704.
Booth, Trevor H. 2022. “Checking Bioclimatic Variables That Combine Temperature and Precipitation Data Before Their Use in Species Distribution Models.” Austral Ecology 47: 1506–14. https://doi.org/10.1111/aec.13234.
Bouchet-Valat, Milan, and Bogumił Kamiński. 2023. “DataFrames.jl: Flexible and Fast Tabular Data in Julia.” Journal of Statistical Software 107. https://doi.org/10.18637/jss.v107.i04.
Brown, Jason L, Daniel J Hill, Aisling M Dolan, Ana C Carnaval, and Alan M Haywood. 2018. “PaleoClim, High Spatial Resolution Paleoclimate Surfaces for Global Land Areas.” Scientific Data 5 (November): 180254. https://doi.org/10.1038/sdata.2018.254.
Buchhorn, Marcel, Bruno Smets, Luc Bertels, Bert De Roo, Myroslava Lesiv, Nandin-Erdene Tsendbazar, Martin Herold, and Steffen Fritz. 2020. “Copernicus Global Land Service: Land Cover 100m: Collection 3: Epoch 2019: Globe.” Zenodo. https://doi.org/10.5281/ZENODO.3939050.
Burgos, E F, M V Vadell, C M Bellomo, V P Martinez, O D Salomon, and I E Gómez Villafañe. 2021. “First Evidence of Akodon-Borne Orthohantavirus in Northeastern Argentina.” EcoHealth 18 (December): 429–39. https://doi.org/10.1007/s10393-021-01564-6.
Danisch, Simon, and Julius Krumbiegel. 2021. “Makie.jl: Flexible High-Performance Data Visualization for Julia.” Journal of Open Source Software 6 (September): 3349. https://doi.org/10.21105/joss.03349.
Dansereau, Gabriel, Pierre Legendre, and Timothée Poisot. 2022. “Evaluating Ecological Uniqueness over Broad Spatial Extents Using Species Distribution Modelling.” Oikos (Copenhagen, Denmark) 2022 (May): e09063. https://doi.org/10.1111/oik.09063.
Dansereau, Gabriel, and Timothée Poisot. 2021. “SimpleSDMLayers.jl and GBIF.jl: A Framework for Species Distribution Modeling in Julia.” Journal of Open Source Software 6 (January): 2872. https://doi.org/10.21105/joss.02872.
Drake, John M. 2014. “Ensemble Algorithms for Ecological Niche Modeling from Presence‐background and Presence‐only Data.” Ecosphere (Washington, D.C) 5 (June): 1–16. https://doi.org/10.1890/es13-00202.1.
Elith, Jane, Simon Ferrier, Falk Huettmann, and John Leathwick. 2005. “The Evaluation Strip: A New and Robust Method for Plotting Predicted Responses from Species Distribution Models.” Ecological Modelling 186 (August): 280–89. https://doi.org/10.1016/j.ecolmodel.2004.12.007.
Elith, Jane, and John R Leathwick. 2009. “Species Distribution Models: Ecological Explanation and Prediction Across Space and Time.” Annual Review of Ecology, Evolution, and Systematics 40 (December): 677–97. https://doi.org/10.1146/annurev.ecolsys.110308.120159.
Evenden, Gerald I, Even Rouault, Frank Warmerdam, Kristian Evers, Thomas Knudsen, Howard Butler, Mike W Taves, et al. 2024. “PROJ.” Computer software. Zenodo. https://doi.org/10.5281/ZENODO.5884394.
Fick, Stephen E, and Robert J Hijmans. 2017. “WorldClim 2: New 1‐km Spatial Resolution Climate Surfaces for Global Land Areas.” International Journal of Climatology: A Journal of the Royal Meteorological Society 37 (October): 4302–15. https://doi.org/10.1002/joc.5086.
GBIF: The Global Biodiversity Information Facility. 2025. What Is GBIF? 2025.
Gearty, William, and Lewis A Jones. 2023. “Rphylopic: An R Package for Fetching, Transforming, and Visualising PhyloPic Silhouettes.” Methods in Ecology and Evolution 14 (November): 2700–2708. https://doi.org/10.1111/2041-210x.14221.
Gonzalez, Andrew, Petteri Vihervaara, Patricia Balvanera, Amanda E Bates, Elisa Bayraktarov, Peter J Bellingham, Andreas Bruder, et al. 2023. “A Global Biodiversity Observing System to Unite Monitoring and Guide Action.” Nature Ecology & Evolution, August, 1–5. https://doi.org/10.1038/s41559-023-02171-0.
Griffith, Jory, Jean-Michel Lord, Michael D Catchen, Maria Isabel Arce-Plata, Manuel Fernandez Galvez Bohorquez, Matusan Chandramohan, María Camilla Diaz-Corzo, et al. 2024. “BON in a Box: An Open and Collaborative Platform for Biodiversity Monitoring, Indicator Calculation, and Reporting,” October. https://doi.org/10.32942/X2M320.
Guisan, Antoine, Reid Tingley, John B Baumgartner, Ilona Naujokaitis-Lewis, Patricia R Sutcliffe, Ayesha I T Tulloch, Tracey J Regan, et al. 2013. “Predicting Species Distributions for Conservation Decisions.” Ecology Letters 16 (December): 1424–35. https://doi.org/10.1111/ele.12189.
Hirzel, A H, V Helfer, and F Metral. 2001. “Assessing Habitat-Suitability Models with a Virtual Species.” Ecological Modelling 145 (November): 111–21. https://doi.org/10.1016/s0304-3800(01)00396-9.
Jenkins, Clinton N, Stuart L Pimm, and Lucas N Joppa. 2013. “Global Patterns of Terrestrial Vertebrate Diversity and Conservation.” Proceedings of the National Academy of Sciences of the United States of America 110 (July): E2602–10. https://doi.org/10.1073/pnas.1302251110.
Karger, Dirk Nikolaus, Olaf Conrad, Jürgen Böhner, Tobias Kawohl, Holger Kreft, Rodrigo Wilber Soria-Auza, Niklaus E Zimmermann, H Peter Linder, and Michael Kessler. 2017. “Climatologies at High Resolution for the Earth’s Land Surface Areas.” Scientific Data 4 (September): 170122. https://doi.org/10.1038/sdata.2017.122.
Karimi, Amir-Hossein, Gilles Barthe, Borja Balle, and Isabel Valera. 2019. “Model-Agnostic Counterfactual Explanations for Consequential Decisions.” arXiv [Cs.LG], May.
Kass, Jamie M, Adam B Smith, Dan L Warren, Sergio Vignali, Sylvain Schmitt, Matthew E Aiello-Lammens, Eduardo Arlé, et al. 2024. “Achieving Higher Standards in Species Distribution Modeling by Leveraging the Diversity of Available Software.” Ecography, November. https://doi.org/10.1111/ecog.07346.
Kellner, Kenneth F, Jeffrey W Doser, and Jerrold L Belant. 2025. “Functional R Code Is Rare in Species Distribution and Abundance Papers.” Ecology 106 (January): e4475. https://doi.org/10.1002/ecy.4475.
Legendre, Pierre, and Miquel De Cáceres. 2013. “Beta Diversity as the Variance of Community Data: Dissimilarity Coefficients and Partitioning.” Ecology Letters 16 (August): 951–63. https://doi.org/10.1111/ele.12141.
Leroy, Boris, Christine N Meynard, Céline Bellard, and Franck Courchamp. 2016. “Virtualspecies, an R Package to Generate Virtual Species Distributions.” Ecography 39 (June): 599–607. https://doi.org/10.1111/ecog.01388.
Marmion, Mathieu, Miia Parviainen, Miska Luoto, Risto K Heikkinen, and Wilfried Thuiller. 2009. “Evaluation of Consensus Methods in Predictive Species Distribution Modelling.” Diversity & Distributions 15 (January): 59–69. https://doi.org/10.1111/j.1472-4642.2008.00491.x.
McShea, William J. 2014. “What Are the Roles of Species Distribution Models in Conservation Planning?” Environmental Conservation 41 (June): 93–96. https://doi.org/10.1017/s0376892913000581.
Mesgaran, Mohsen B, Roger D Cousens, and Bruce L Webber. 2014. “Here Be Dragons: A Tool for Quantifying Novelty Due to Covariate Range and Correlation Change When Projecting Species Distribution Models.” Diversity & Distributions 20 (October): 1147–59. https://doi.org/10.1111/ddi.12209.
Mitchell, Rory, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. 2021. “Sampling Permutations for Shapley Value Estimation.” arXiv [Stat.ML], April.
Osborne, Owen G, Henry G Fell, Hannah Atkins, Jan van Tol, Daniel Phillips, Leonel Herrera-Alsina, Poppy Mynard, et al. 2022. “Fauxcurrence: Simulating Multi‐species Occurrences for Null Models in Species Distribution Modelling and Biogeography.” Ecography 2022 (July): e05880. https://doi.org/10.1111/ecog.05880.
Owen, Robert D, Douglas G Goodin, David E Koch, Yong-Kyu Chu, and Colleen B Jonsson. 2010. “Spatiotemporal Variation in Akodon montensis (Cricetidae: Sigmodontinae) and Hantaviral Seroprevalence in a Subtropical Forest Ecosystem.” Journal of Mammalogy 91 (April): 467–81. https://doi.org/10.1644/09-MAMM-A-152.1.
Roberts, David R, Volker Bahn, Simone Ciuti, Mark S Boyce, Jane Elith, Gurutzeta Guillera-Arroita, Severin Hauenstein, et al. 2017. “Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure.” Ecography 40 (August): 913–29. https://doi.org/10.1111/ecog.02881.
Roesch, Elisabeth, Joe G Greener, Adam L MacLean, Huda Nassar, Christopher Rackauckas, Timothy E Holy, and Michael P H Stumpf. 2023. “Julia for Biologists.” Nature Methods 20 (May): 655–64. https://doi.org/10.1038/s41592-023-01832-z.
Stock, Andy, Edward J Gregr, and Kai M A Chan. 2023. “Data Leakage Jeopardizes Ecological Applications of Machine Learning.” Nature Ecology & Evolution 7 (November): 1743–45. https://doi.org/10.1038/s41559-023-02162-1.
Strydom, Tanya, and Timothée Poisot. 2023. “SpatialBoundaries.jl: Edge Detection Using Spatial Wombling.” Ecography 2023 (May). https://doi.org/10.1111/ecog.06609.
Tuanmu, Mao-Ning, and Walter Jetz. 2014. “A Global 1‐km Consensus Land‐cover Product for Biodiversity and Ecosystem Modelling: Consensus Land Cover.” Global Ecology and Biogeography: A Journal of Macroecology 23 (September): 1031–45. https://doi.org/10.1111/geb.12182.
———. 2015. “A Global, Remote Sensing‐based Characterization of Terrestrial Habitat Heterogeneity for Biodiversity and Ecosystem Modelling: Global Habitat Heterogeneity.” Global Ecology and Biogeography: A Journal of Macroecology 24 (November): 1329–39. https://doi.org/10.1111/geb.12365.
Van Looveren, Arnaud, and Janis Klaise. 2019. “Interpretable Counterfactual Explanations Guided by Prototypes.” arXiv [Cs.LG], July.
Wadoux, Alexandre M J-C, Nicolas P A Saby, and Manuel P Martin. 2023. “Shapley Values Reveal the Drivers of Soil Organic Carbon Stock Prediction.” SOIL 9 (January): 21–38. https://doi.org/10.5194/soil-9-21-2023.
Zurell, Damaris, Jane Elith, and Boris Schröder. 2012. “Predicting to New Environments: Tools for Visualizing Model Behaviour and Impacts on Mapped Distributions.” Diversity & Distributions 18 (June): 628–34. https://doi.org/10.1111/j.1472-4642.2012.00887.x.
Zurell, Damaris, Janet Franklin, Christian König, Phil J Bouchet, Carsten F Dormann, Jane Elith, Guillermo Fandos, et al. 2020. “A Standard Protocol for Reporting Species Distribution Models.” Ecography 43 (September): 1261–77. https://doi.org/10.1111/ecog.04960.